In the last decade, technology-based community science programs, such as eBird and iNaturalist, have vastly expanded our understanding of species distributions. Data collected by program participants have altered range maps and revealed new insights into species’ niches and phenology (i.e., the timing of life history events). Such programs also open opportunities for the public to engage directly in the practice of collecting meaningful scientific data.
Despite the opportunities that such projects offer scientists and the general public, the data often expose bias – both in sampling and among community science participants. For example, observations are often skewed toward areas with higher human population densities, public parks, and uncommon species. Likewise, socioeconomic status is a key predictor of participation in these programs. This biases the distribution of observations but can also offer insights into how science can be made more accessible to the public.
In this exercise, you will explore data collected by iNaturalist participants in Washington, DC in May of 2022. You will use raster and point data to explore their observations.
When I create an R Markdown file to communicate a coding process, I usually work in a .R script file and copy-and-paste the code into the R Markdown document at the end.
The points allotted for each question are provided in highlighted red bold text (e.g., [1.0]) within the question itself. When applicable, total points for a question may represent the sum of individually graded components, which are provided in red text (e.g., [1.0]).
Points may be deducted from each question’s total – for example, if your code chunk options are not set to eval = TRUE (but see below). Note: The maximum deduction is the total points value for a given question.
Pay careful attention to the format of your code – not following the rules above will cost you lots of little points (and some big ones) that can really add up!
In addition to points allotted per question, you must ensure that your R Markdown document runs out-of-the-box [25% off of the total grade] – in other words, the document will knit without error. Some tips for doing so:

- Use relative file paths when reading in files (e.g., with source() or read_csv()). Never use setwd() in your code!
- If a code chunk is not meant to be evaluated, set eval = FALSE in the options for that chunk. Otherwise, ensure all of your code chunk options are eval = TRUE.

Click the blue button below to view the functions that you may use in completing this problem set. Make sure that you know what each function does (use ?[function name] if you do not). Do not use any functions outside of this list!
In this assignment, you may use only the following R functions (Note: If you are unclear on what a given function does, use ? to view the help file!):

=, <-, $, ~, /, c, file.path, library, list.files, mean, names, tolower, inner_join, mutate, n, pull, summarize, %>%, map, set_names, st_area, st_centroid, st_crs, st_difference, st_join, st_read, st_sf, st_transform, st_union, crop, extract, global, mask, rast, vect, as_tibble, +, tmap_mode, tm_dots, tm_shape, tm_polygons, tm_raster, set_units

Note: The packages dplyr, ggplot2, magrittr, readr, rlang, and tibble are all part of the tidyverse and are loaded with library(tidyverse).
1. [0.5] Save and knit this document: problem_set_5_Evans_Brian.rmd

2. [0.5] Load the sf, tmap, and tidyverse libraries.
library(sf)
library(tmap)
library(tidyverse)
3. [0.5] Set the tmap mode to “view” for the entire document.
tmap_mode('view')
4. [1.0] Modify the code below such that it reads in the geojson shapefiles inat_2022_may, dc_surface_water, and dc_census [0.25] and assigns the shapes to your global environment as a single list object. In doing so:

- Assign the names inat, water, and census to the individual list items.

shapes <-
  c(
    inat = 'inat_2022_may.geojson',
    water = 'dc_surface_water.geojson',
    census = 'dc_census.geojson') %>%
  map(
    ~ file.path('data/raw/shapefiles', .x) %>%
      st_read() %>%
      set_names(
        names(.) %>%
          tolower()) %>%
      st_transform(crs = 5070))
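As a quick sanity check (a sketch, assuming the shapes list created above), you can confirm that every layer now shares the target coordinate reference system using st_crs from the allowed function list:

```r
# Verify that each list item was transformed to EPSG 5070 -- each
# element of the returned list should be 5070:
shapes %>%
  map(
    ~ st_crs(.x)$epsg)
```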
Note: The income variable is the median income per
census tract.
5. [1.0] Using shapes$census:

- Remove surface water from the census geometry and assign the resulting shape to your global environment with the name dc_land.

dc_land_water <-
  shapes %>%
  map(
    ~ .x %>%
      st_union() %>%
      st_sf())

dc_land <-
  dc_land_water$census %>%
  st_difference(dc_land_water$water)

rm(dc_land_water)
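With the water removed, you can check how much land area remains (a sketch using st_area and set_units from the allowed function list, assuming the dc_land object created above):

```r
# Calculate the land area of Washington, DC in square kilometers:
dc_land %>%
  st_area() %>%
  units::set_units("km^2")
```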
6. [1.0] Modify the code below, using list.files to list the rasters in the folder ‘data/rasters_problem_set_5’ and a purrr::map function to read them in:
rasters <-
  list.files(
    "data/rasters_problem_set_5",
    full.names = TRUE) %>%
  set_names(
    c("canopy_cover",
      "impervious_surface")) %>%
  purrr::map(
    ~ terra::rast(.x) %>%
      terra::crop(shapes$census) %>%
      terra::mask(shapes$census)) %>%
  terra::rast()
## Warning: [mask] CRS do not match
## Warning: [rast] CRS do not match
Note: You may see a warning that the CRS of the two raster files are not equivalent – don’t worry, they are!
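If you would like to verify this yourself, the CRS of the raster stack can be described as below (a diagnostic sketch; note that terra::crs is not on the allowed function list, so use it for exploration only, not in your submitted answers):

```r
# Describe the CRS of the raster stack -- the authority/code fields
# should report EPSG 5070, matching the transformed vector data:
terra::crs(rasters, describe = TRUE)
```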
7. [1.0] Calculate the mean canopy cover of the land area in Washington DC.
dc_land %>%
  terra::vect() %>%
  terra::extract(
    rasters$canopy_cover,
    .,
    mean,
    na.rm = TRUE) %>%
  pull()
## [1] 17.43166
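An alternative route to roughly the same summary statistic, sketched with terra::global from the allowed function list: mask the canopy layer to the land area, then summarize the remaining cells. Because masking and polygon extraction treat edge cells slightly differently, the value may differ marginally from the one above.

```r
# Mask the canopy cover raster to DC's land area, then take the
# mean of the remaining (non-NA) cells:
rasters$canopy_cover %>%
  terra::mask(
    dc_land %>%
      terra::vect()) %>%
  terra::global(mean, na.rm = TRUE)
```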
8. [1.5] Generate a tmap [0.5] of mean canopy cover by census tract [1.0].
shapes$census %>%
  terra::vect() %>%
  terra::extract(
    rasters$canopy_cover,
    .,
    mean,
    na.rm = TRUE) %>%
  inner_join(
    shapes$census %>%
      mutate(ID = row_number()),
    .) %>%
  tm_shape() +
  tm_polygons(
    col = "canopy_cover",
    alpha = .6,
    palette = "Greens")
9. [1.5] Generate a choropleth tmap [0.5] that displays the density [0.5] of iNaturalist observations per census tract (number of observations per unit area; area unit = km^2) [0.5].
shapes$inat %>%
  st_join(shapes$census, .) %>%
  as_tibble() %>%
  summarize(n = n(),
            .by = geoid) %>%
  inner_join(shapes$census, .) %>%
  mutate(
    area =
      st_area(.) %>%
      units::set_units("km^2"),
    density = n / area) %>%
  tm_shape() +
  tm_polygons(
    title = 'Observations',
    style = 'kmeans',
    col = 'density',
    palette = 'viridis')
10. [1.5] Generate a tmap that displays (with layers ordered as follows):

- canopy cover;
- impervious surface;
- median income by census tract;
- the density of iNaturalist observations per census tract.
inat_dense <-
  shapes$inat %>%
  st_join(shapes$census, .) %>%
  as_tibble() %>%
  summarize(n = n(),
            .by = geoid) %>%
  inner_join(shapes$census, .) %>%
  mutate(
    area =
      st_area(.) %>%
      units::set_units("km^2"),
    density = n / area)
# tm_basemap(
#   c('Esri.WorldTopoMap',
#     'OpenStreetMap',
#     'Esri.WorldImagery')) +

tm_shape(rasters$canopy_cover) +
  tm_raster(alpha = .6,
            palette = "Greens") +
  tm_shape(rasters$impervious_surface) +
  tm_raster(alpha = .6,
            palette = "OrRd") +
  tm_shape(shapes$census) +
  tm_polygons(col = 'income',
              palette = 'viridis') +
  tm_shape(inat_dense) +
  tm_dots(col = 'density',
          palette = 'magma')
Extra credit [0.5] : Use a custom color palette for each of the layers in Question 10!
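As a hint for the extra credit (a sketch; the color choices here are purely illustrative), tmap accepts a custom vector of colors in place of a named palette:

```r
# Supply a custom color ramp to a single layer; repeat the pattern,
# with different colors, for each layer in your map:
tm_shape(shapes$census) +
  tm_polygons(
    col = 'income',
    palette = c('#ffffcc', '#41b6c4', '#253494'))
```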